Abstract

Using a set of 28 high resolution, high signal to noise ratio (S/N) QSO Lyα absorption spectra, we
investigate the non-Gaussian features of the transmitted flux fluctuations, and their effect upon the power 
spectrum of this field. We find that the spatial distribution of the local power of the transmitted flux on 
scales k > 0.05 s/km is highly spiky or intermittent. The probability distribution functions (PDFs) of 
the local power are long-tailed. The power on small scales is dominated by small probability events, 
and consequently, the uncertainty in the power spectrum of the transmitted flux field is generally large. 
This uncertainty arises due to the slow convergence of an intermittent field to a Gaussian limit required 
by the central limit theorem (CLT). To reduce this uncertainty, it is common to estimate the error of the 
power spectrum by selecting subsamples with an "optimal" size. We show that this conventional method 
actually does not calculate the variance of the original intermittent field but of a Gaussian field. Based on 
the analysis of intermittency, we propose an algorithm to calculate the error. It is based on a bootstrap
resampling among all independent local power modes. This estimation doesn't require any extra parameter
like the size of the subsamples, and is sensitive to the intermittency of the fields. This method effectively 
reduces the uncertainty in the power spectrum when the number of independent modes matches the 
condition of the CLT convergence. 

1 Introduction 

Lyα absorption on the short wavelength side of Lyα emission in QSO spectra with high redshift indicates
the presence of neutral hydrogen (HI) along the line of sight. The low column density ($10^{13}$ to $10^{17}$ cm$^{-2}$)
HI absorption is believed to be due to diffusely distributed intergalactic medium (IGM). The IGM mass field 
is passive with respect to the dark matter present in the sense that its gravitational clustering is dominated 
by the gravity of dark matter. The IGM is assumed to be a good tracer of the underlying mass density 
distribution on scales larger than the thermal diffusion or Jeans length in the linear or even the nonlinear 
regime (Bi 1993; Fang et al. 1993; Hui & Gnedin 1997; Nusser & Haehnelt 1999). The statistical features
of the QSO Lyα transmitted flux can be used to estimate the corresponding features of the underlying mass
field. Consequently, the power spectrum of the transmitted flux fluctuations of high redshift QSO Lyα
absorption spectra has been extensively used for reconstructing the initially linear perturbations of the
cosmic mass and velocity fields and discriminating among models of structure formation (Bi, Ge & Fang
1995; Bi & Davidsen 1997; Croft et al. 1999, 2002; Feng & Fang 2000; McDonald et al. 2000; Pando,
Feng & Fang 2001; Zhan & Fang 2002).

Therefore, it is necessary to study the effects of the non-Gaussian features of the flux field upon the 
detection of the power spectrum of this field, and its error estimation. In this paper, we will focus on the 
effects of intermittency of the QSO Lyα transmitted flux on the power spectrum estimation.

The first indication of intermittency arises from the success of the lognormal model in explaining the
QSO Lyα forests (Bi 1993; Bi & Davidsen 1997), as a lognormal field is typically intermittent. When the
evolution of the IGM is approximately described by a stochastic Burgers' equation, the IGM mass and velocity
fields are found to be lognormal or intermittent (Jones 1999; Matarrese & Mohayaee 2002). Intermittency
has been detected with high resolution, high signal to noise ratio (S/N) samples of QSO Lyα absorption
spectra (Jamkhedkar, Zhan & Fang 2000; Feng, Pando & Fang 2001; Zhan, Jamkhedkar & Fang 2001;
Jamkhedkar 2002). Recently, the intermittent features have been further examined by measuring the
structure function and intermittent exponent of 28 Keck HIRES QSO spectra (Pando et al. 2002). This work
shows that the intermittent behaviour is significant on scales k > 0.05 s/km.

A basic characteristic of an intermittent field is that the power of the fluctuations of the flux field is 
concentrated in rare spikes which are randomly and widely scattered in space, with low power between 
the spikes. In this case, the power spectrum of an intermittent field is dominated by rare and improbable 
events (spikes). In other words, only a small fraction of the total modes contribute to the power while 
most are inactive. Although an intermittent field is statistically homogeneous, the rare events lead to 
significant differences among samples from different parts of the universe when the spatial size of the region 
is not large enough to contain numerous spikes. Mathematically, this makes the central limit theorem less 
effective. All these imply that the uncertainty in the power spectrum is directly related to the intermittency 
of the field. 

Using the same samples studied by Pando et al. (2002), we will investigate the uncertainty of the power 
spectrum of QSO Lyα transmitted flux fluctuations. The paper is organised as follows. §2 briefly introduces
the basic properties of an intermittent field. §3 presents our algorithm for calculating the power spectra with 
the discrete wavelet transform (DWT). §4 describes the samples used for analysis. §5 discusses the basic 
properties of the DWT power spectrum of the samples. §6 addresses the effect of the intermittent behaviour 
upon the power spectrum. Finally, §7 summarises the results and conclusions. 



2 Intermittent field 

Most of this section has been presented in our previous work (Pando et al. 2002). For the paper to be
self-contained, we repeat these results very briefly. The principal characteristic of an intermittent random field,
like the Lyα transmitted flux $F(x)$, is measured by the asymptotic behaviour of the ratio between the high- and
low-order moments of the field, defined as

$$\frac{\langle [F(x+r)-F(x)]^{2n}\rangle}{[\langle [F(x+r)-F(x)]^{2}\rangle]^{n}} \asymp \left(\frac{r}{L}\right)^{\zeta}, \tag{1}$$

where $\langle ... \rangle$ is the average over the ensemble of fields, and $L$ is the size of the sample. Eq. (1) can be
rewritten as

$$\frac{S_r^{2n}}{[S_r^{2}]^{n}} \asymp \left(\frac{r}{L}\right)^{\zeta}, \tag{2}$$

where $S_r^{2n}$ is the structure function defined by

$$S_r^{2n} = \langle |\Delta_r(x)|^{2n}\rangle. \tag{3}$$

Here $\Delta_r(x) = F(x+r) - F(x)$. $S_r^{2n}$ is the $2n$th moment of the fluctuations of the field. Therefore, the ratio
in eq. (2) is the $2n$th moment $S_r^{2n}$ normalised by the $n$th power of the 2nd moment, or power, $S_r^{2}$.

A field is intermittent if the exponent $\zeta$ is negative on small scales $r$ (Gartner & Molchanov 1990;
Zel'dovich, Ruzmaikin & Sokoloff 1990). Intermittency is measured by the $n$- and $r$-dependencies of $\zeta$.

For a Gaussian field,

$$\frac{S_r^{2n}}{[S_r^{2}]^{n}} = (2n-1)!!. \tag{4}$$

This ratio is independent of the scale $r$, and therefore, the intermittent exponent $\zeta = 0$. Thus, a Gaussian
field is not intermittent. Moreover, not all non-Gaussian fields are intermittent. Only fields for which the
ratio in eq. (1) diverges as $r \to 0$ are intermittent.

For a lognormal field, the probability distribution function (PDF) of $\Delta_r^2(x)$ is

$$P[\Delta_r^2(x)] = \frac{1}{2^{1/2}\pi^{1/2}\Delta_r^2(x)\sigma(r)} \exp\left\{-\frac{1}{2}\left[\frac{\ln\Delta_r^2(x) - \ln\Delta_r^2(x)_m}{\sigma(r)}\right]^2\right\}, \tag{5}$$

where $\Delta_r^2(x)_m$ is the median of $\Delta_r^2(x)$ (Vanmarcke 1983). The variance $\sigma^2(r)$ of $\ln\Delta_r^2(x)$ may be a function
of the scale $r$. Using eq. (5), we have

$$\frac{S_r^{2n}}{[S_r^{2}]^{n}} = e^{(n^2-n)\sigma^2(r)/2}. \tag{6}$$

The intermittent exponent of a lognormal field is then

$$\zeta \simeq \frac{(n^2-n)\sigma^2(r)}{2\ln(r/L)}. \tag{7}$$

Because $r < L$, $\zeta$ is negative. Therefore, a lognormal field is intermittent.

In comparison with the Gaussian result (eq. (4)), the ratio of eq. (6) increases faster with $n$. This is
due to the fact that a lognormal PDF is long-tailed; a long-tailed PDF is a common property of intermittent
fields. The property $S_r^{2n} \gg [S_r^{2}]^{n}$ on small scales indicates that the field contains "abnormal" events of
large density fluctuations $|\Delta_r(x)|$. The events of big $|\Delta_r(x)|$ correspond to a sharp increase or decrease of
the field, and constitute the long-tail of the PDF of $|\Delta_r(x)|$.
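
As a rough numerical illustration of eqs. (4), (6) and (7), the following sketch evaluates the moment ratio for a Gaussian and for a lognormal field, together with the lognormal intermittent exponent. The function names and the sample values of $\sigma$ and $r/L$ are assumptions made for illustration only; they are not taken from the data.

```python
import math


def gaussian_ratio(n):
    """Moment ratio S^{2n} / (S^2)^n of a Gaussian field, eq. (4): (2n - 1)!!."""
    result = 1
    for odd in range(1, 2 * n, 2):
        result *= odd
    return result


def lognormal_ratio(n, sigma):
    """Moment ratio of a lognormal field, eq. (6): exp[(n^2 - n) sigma^2 / 2]."""
    return math.exp((n * n - n) * sigma ** 2 / 2.0)


def lognormal_exponent(n, sigma, r_over_L):
    """Intermittent exponent of a lognormal field, eq. (7); negative for r < L."""
    return (n * n - n) * sigma ** 2 / (2.0 * math.log(r_over_L))


if __name__ == "__main__":
    sigma, r_over_L = 1.5, 1.0e-3  # hypothetical values, for illustration only
    for n in (2, 3, 4):
        print(n, gaussian_ratio(n), lognormal_ratio(n, sigma),
              lognormal_exponent(n, sigma, r_over_L))
```

By construction, $(r/L)^{\zeta}$ evaluated with the exponent of eq. (7) reproduces the ratio of eq. (6).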



3 DWT algorithm of power spectrum of Lyα transmitted flux
3.1 The field of Lyα transmitted flux fluctuations

The transmitted flux of a QSO absorption spectrum is given by $F(\lambda) = F_c(\lambda)e^{-\tau(\lambda)}$, where $F_c(\lambda)$ is the
continuum, $e^{-\tau(\lambda)}$ the transmission, and $\tau(\lambda)$ the opacity. Since the observed data (§4) are already reduced
by a continuum fitting, we have

$$F(\lambda) = e^{-\tau(\lambda)} + n(\lambda), \tag{8}$$

where the term $n(\lambda)$ describes the stochastic noise, including the Poisson noise of the photon count. It satisfies
the statistical properties

$$\langle n(\lambda)\rangle = 0, \qquad \langle n(\lambda)n(\lambda')\rangle = \sigma^2(\lambda)\,\delta^K_{\lambda,\lambda'}, \tag{9}$$

where $\delta^K$ is the Kronecker delta. The ensemble average is over the density fluctuations. Therefore,
we have $\langle F(\lambda)\rangle = \langle e^{-\tau(\lambda)}\rangle$. Generally speaking, $\langle F(\lambda)\rangle$ is still position-dependent, as it depends
on the photoionization rate of HI and the temperature of the IGM. Moreover, for a given scale $r$, all
fluctuations on scales larger than $r$ around the position $\lambda$ act as a background for this fluctuation. This
leads to a position dependent background. If one can assume that $\langle F(\lambda)\rangle$ is a constant, then we have
$\langle F(\lambda)\rangle = \langle e^{-\tau}\rangle$, which is the mean transmission.

The field of Lyα transmitted flux fluctuations is defined by

$$\delta^n(\lambda) = \frac{F(\lambda) - \langle F(\lambda)\rangle}{\langle F(\lambda)\rangle}, \tag{10}$$

or

$$\delta(\lambda) = F(\lambda) - \langle F(\lambda)\rangle. \tag{11}$$

$\delta^n(\lambda)$ and $\delta(\lambda)$ differ by the normalisation factor $\langle F(\lambda)\rangle$. If $\langle F(\lambda)\rangle$ is $\lambda$-independent, the normalised
and unnormalised fields $\delta^n(\lambda)$ and $\delta(\lambda)$ differ only by a constant. If $\langle F(\lambda)\rangle^2$ doesn't contain small scale
fluctuations, then one can still treat $\langle F(\lambda)\rangle^2$ as a constant when studying the fields on small scales.

3.2 DWT variables of Lyα transmitted flux field



In order to easily illustrate the effect of intermittency on the power spectrum, we calculate the power
spectrum of Lyα transmitted flux with the discrete wavelet transform (DWT).

We use $x_1$ and $x_2$ to denote the spatial range of a flux field corresponding to the wavelength range from
$\lambda_1$ to $\lambda_2$. To implement a DWT scale-space decomposition of a flux field $F(x)$, we first chop the spatial
range $L = x_2 - x_1$ of the 1-D sample into $2^j$ subintervals labelled with $l = 0, ..., 2^j - 1$. Each subinterval
spans a spatial range $L/2^j$. The subinterval $l$ is from $x_1 + Ll/2^j$ to $x_1 + L(l+1)/2^j$. The index $j$ is an
integer. Thus, we decompose the space $L$ into cells $(j, l)$, where $j$ denotes the scale $L/2^j$, and $l$ the spatial
range $[x_1 + Ll/2^j, x_1 + L(l+1)/2^j]$.

Corresponding to each cell (or mode) $(j, l)$, there is a scaling function $\phi_{jl}(x)$ and a wavelet function
$\psi_{jl}(x)$, which are the orthogonal and complete basis for the scale-space decomposition. The most important
property of the DWT basis is its locality in both scale and physical (or redshift) spaces. The scaling function
$\phi_{jl}(x)$ is a window function on scale $j$ and at the position $l$. The wavelet function $\psi_{jl}(x)$ is admissible
(Daubechies 1992), i.e., $\int \psi_{jl}(x)dx = 0$. Therefore, it measures the fluctuation on scale $j$ and at position $l$.
All wavelets with compactly supported bases will produce similar results. We will use the Daubechies 4 (D4)
basis below, for its ease in numerical calculations among the compactly supported orthogonal bases. The
D4 scaling function $\phi_{jl}(x)$, wavelet $\psi_{jl}(x)$ and their Fourier transforms $\hat\phi_{jl}(n)$, $\hat\psi_{jl}(n)$ are shown in Yang et
al. (2001b).

Subjecting a transmitted flux $F(x)$ to the DWT, we have (Daubechies 1992; Fang & Thews 1998)

$$F(x) = \sum_{l=0}^{2^j-1} \epsilon^F_{jl}\phi_{jl}(x) + \sum_{j'=j}^{J}\sum_{l=0}^{2^{j'}-1} \tilde\epsilon^F_{j'l}\psi_{j'l}(x), \tag{12}$$

where $J$ is given by the finest scale (pixel resolution) of the sample, i.e., $\Delta x = L/2^J$, and $j$ is the scale we
want to study. The field is now described by the DWT variables $\epsilon^F_{jl}$ and $\tilde\epsilon^F_{jl}$, which are called the scaling function
coefficient (SFC) and wavelet function coefficient (WFC), respectively.

The SFC is given by projecting $F(x)$ onto $\phi_{jl}(x)$,

$$\epsilon^F_{jl} = \int F(x)\phi_{jl}(x)dx. \tag{13}$$

The SFC $\epsilon^F_{jl}$ is proportional to the mean field at the mode $(j, l)$. The WFC is obtained by projecting $F(x)$
onto $\psi_{jl}(x)$,

$$\tilde\epsilon^F_{jl} = \int F(x)\psi_{jl}(x)dx. \tag{14}$$

The WFC is basically the difference between $F(x)$ and $F(x+r)$, where $x \simeq x_1 + lL/2^j$ and $r \simeq L/2^j$.
Thus, the WFC can be used to replace the variable $F(x+r) - F(x)$. Eq. (3) can then be rewritten as

$$S_j^{2n} = \langle |\tilde\epsilon^F_{jl}|^{2n}\rangle. \tag{15}$$

If the "fair sample hypothesis" (Peebles 1980) holds, one can replace the ensemble average in eq. (15) by
a spatial average. We then have

$$S_j^{2n} = \frac{1}{2^j}\sum_{l=0}^{2^j-1} |\tilde\epsilon^F_{jl}|^{2n}. \tag{16}$$
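
The spatial average of eq. (16) can be sketched numerically as follows. This is a minimal illustration only: it uses the Haar wavelet in place of the D4 basis adopted in the paper (a deliberate simplification, since the Haar WFC is literally a difference between the two halves of a cell), and the function names are hypothetical. The field length is assumed to be a power of two.

```python
import numpy as np


def haar_wfc(field, j):
    """Haar wavelet function coefficients of `field` on scale j (2**j cells).

    Haar is used here instead of D4 purely for simplicity (assumption).
    """
    field = np.asarray(field, dtype=float)
    n_cells = 2 ** j
    m = field.size // n_cells  # pixels per cell
    if m < 2 or m * n_cells != field.size:
        raise ValueError("field length must be 2**J with J > j")
    cells = field.reshape(n_cells, m)
    half = m // 2
    # Orthonormal discrete Haar wavelet: +1/sqrt(m) on the first half of the
    # cell, -1/sqrt(m) on the second half.
    left = cells[:, :half].sum(axis=1)
    right = cells[:, half:].sum(axis=1)
    return (left - right) / np.sqrt(m)


def structure_function(field, j, n):
    """Spatial-average estimate of S_j^{2n}, eq. (16)."""
    wfc = haar_wfc(field, j)
    return np.mean(np.abs(wfc) ** (2 * n))
```

For a constant field all WFCs vanish, which reflects the admissibility condition $\int\psi_{jl}(x)dx = 0$.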

3.3 Power spectra of Lyα transmitted flux

For a transmitted flux fluctuation field $\delta^n(x)$, the two point correlation function is defined by

$$\xi^n(\Delta x) = \langle \delta^n(x_1)\delta^n(x_2)\rangle, \tag{17}$$

where $\Delta x = x_2 - x_1$. Similarly, one can define a two point correlation function of the unnormalised field
$\delta(x)$ as

$$\xi(\Delta x) = \langle \delta(x_1)\delta(x_2)\rangle. \tag{18}$$

The Fourier counterparts of $\xi^n(\Delta x)$ and $\xi(\Delta x)$ are, respectively, the normalised and unnormalised power
spectra, $P^n(k)$ and $P(k)$.

The algorithm of the power spectrum for both $P^n(k)$ and $P(k)$ in the DWT representation has been
developed (Pando & Fang 1998; Fang & Feng 2000; Yang et al. 2001a, 2001b; Jamkhedkar, Bi &
Fang 2001). We will study only the unnormalised power spectrum $P(k)$ below. If $\langle F(\lambda)\rangle$ is constant, the
normalised and unnormalised power spectra $P^n(k)$ and $P(k)$ differ only by a constant factor $\langle F(\lambda)\rangle^2$. In
this case, all the results for $P(k)$ also hold for $P^n(k)$. Problems with $P^n(k)$ caused by a non-constant $\langle F(\lambda)\rangle$
will be discussed in the conclusions.

The power spectrum in the DWT variables is given by

$$P_j = \frac{1}{2^j}\sum_{l=0}^{2^j-1} (\tilde\epsilon^F_{jl})^2 - \frac{1}{2^j}\sum_{l=0}^{2^j-1} (\tilde\epsilon^N_{jl})^2. \tag{19}$$

The first term on the r.h.s. of eq. (19) is the power of $\delta(x)$ on scale $j$. The second term on the r.h.s. of
eq. (19) is due to the noise, given by

$$(\tilde\epsilon^N_{jl})^2 = \int \sigma^2(x)\psi^2_{jl}(x)dx. \tag{20}$$

Comparing eq. (19) with eq. (16), we have $P_j = S_j^2$. Therefore, the power spectrum is actually the second
moment of the PDF of $\tilde\epsilon^F_{jl}$.
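
A minimal sketch of the noise-subtracted estimator of eqs. (19) and (20) is given below. As an assumption made for brevity, it again uses the Haar wavelet rather than D4; for the orthonormal discrete Haar wavelet, $\psi^2_{jl}$ equals $1/m$ on each of the $m$ pixels of a cell, so the noise term of eq. (20) reduces to the cell average of $\sigma^2$. The function name and the array layout are hypothetical.

```python
import numpy as np


def haar_power(flux, sigma, j):
    """Noise-subtracted DWT power on scale j, following eqs. (19)-(20).

    flux, sigma : 1-D arrays of pixel fluxes and 1-sigma noise (length 2**J).
    Returns (P_j, per-mode terms). Haar stands in for D4 (assumption).
    """
    flux = np.asarray(flux, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    n_cells = 2 ** j
    m = flux.size // n_cells
    if m < 2 or m * n_cells != flux.size or sigma.size != flux.size:
        raise ValueError("arrays must share a length 2**J with J > j")
    half = m // 2
    cells = flux.reshape(n_cells, m)
    # Haar WFC of the flux in each cell.
    wfc = (cells[:, :half].sum(axis=1) - cells[:, half:].sum(axis=1)) / np.sqrt(m)
    # Noise term: integral of sigma^2 * psi^2 over the cell = cell mean of sigma^2.
    noise_sq = (sigma.reshape(n_cells, m) ** 2).mean(axis=1)
    per_mode = wfc ** 2 - noise_sq
    return per_mode.mean(), per_mode
```

For a flat spectrum contaminated only by white noise of known $\sigma$, the two terms cancel on average and the estimated power scatters around zero.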

It has been shown in general that the DWT power spectrum $P_j$ of a random field $\delta(x)$ is related to the
Fourier power spectrum $P(n)$ of the field as

$$P_j = \frac{1}{2^j}\sum_{n=-\infty}^{\infty} |\hat\psi(n/2^j)|^2 P(n), \tag{21}$$

where $n$ is related to the wavenumber by $k = 2\pi n/L$. Therefore, the DWT power $P_j$ is a banded Fourier
power spectrum. Since for the D4 wavelet $|\hat\psi(\eta)|^2$ has a peak at $\eta \simeq 1$, the band $j$ is around the wavenumber
$k = 2\pi n/L \simeq 2\pi 2^j/L$.

The difference between the DWT mode $(j, l)$ and the Fourier mode $k$ should be emphasised. For a
given $j$, the size of a cell (mode) is $L/2^j$, which corresponds to wavenumber $k = 2\pi 2^j/L$. However, the
wavelet function $\psi_{jl}(x)$ is localised within $L/2^j$, and, therefore, the uncertainty relation $\Delta x\Delta k = 2\pi$ gives
$\Delta k = 2\pi/(L/2^j) = k$. That is, a DWT mode with $j$ actually corresponds to a band of Fourier modes $k \pm k/2$,
i.e., the relevant band is $\Delta k/k \simeq \Delta\ln k \simeq 1$. This banding is optimised in the sense that to detect small scale
fluctuations (larger wavenumber $k$), the size of the pieces $\Delta x$ is chosen to be smaller, while to detect large
scale fluctuations (smaller wavenumber), the size of the pieces $\Delta x$ is chosen to be larger. We sometimes
characterise a DWT scale $j$ by the Fourier mode $k$. In this case, $k$ is given by the maximum $\eta_{max}$ of the
Fourier transform $|\hat\psi_{jl}(\eta)|$.

4 Samples 

4.1 Observed data 

The observational data used in our study consist of 28 Keck HIRES QSO spectra (Kirkman & Tytler
1997). The QSO emission redshifts cover a range from 2.19 to 4.11. For each of the 28 QSOs,
the data are given in the form of pixels ($i = 1, 2, ...$) with wavelength $\lambda_i$, flux $F(\lambda_i)$ and noise $\sigma(\lambda_i)$. The
noise accounts for the Poisson fluctuations in the photon count, the noise due to the background and the
instrumentation. The continuum of each spectrum is given by IRAF CONTINUUM fitting.

For our purpose, the useful wavelength region is from the Lyβ emission to the Lyα emission, excluding
a region of about 0.06 in redshift close to the quasar to avoid any proximity effects. In this wavelength
range, the number of pixels of the data is about $1.2 \times 10^4$ for each spectrum. Fig. 1 shows the wavelength
range for 30 QSOs. Since about 44% of the pixels of Q0241-0146 and about 27% of the pixels of Q1330+0108
have S/N < 3, we will not use these two QSOs in our analysis. We use only the remaining 28 QSO forest samples.

For each bin in this data set, the ratio $\Delta\lambda/\lambda$ is constant, i.e., $\Delta\lambda/\lambda \simeq 13.8 \times 10^{-6}$, or $\delta v \simeq 4.01$ km/s,
and, therefore, the resolution is about 8 km/s. The distance between $N$ pixels in the units of the local
velocity scale is given by $\Delta v = 2c(1 - \exp[-(1/2)N\delta v/c])$ km/s, or wavenumber $k = 2\pi/\Delta v$ s/km. We use
only $2^{13} = 8192$ pixels of each spectrum. Thus, each cell on the DWT scale $j$ corresponds to $N = 2^{13-j}$
pixels.
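
The conversion from DWT scale $j$ to pixels, velocity width and wavenumber described above can be written out as a short sketch. The pixel size of 4.01 km/s and the $2^{13}$ pixels per spectrum are taken from the text; the function names and the value used for the speed of light are choices made here for illustration.

```python
import math

C_KM_S = 299792.458   # speed of light in km/s
DELTA_V = 4.01        # pixel size in km/s, as quoted in the text
J_MAX = 13            # each spectrum uses 2**13 = 8192 pixels


def pixels_per_cell(j):
    """Number of pixels N = 2**(13 - j) in one cell on DWT scale j."""
    return 2 ** (J_MAX - j)


def velocity_span(n_pixels, delta_v=DELTA_V):
    """Velocity width of n_pixels: 2c (1 - exp[-N dv / (2c)]) in km/s."""
    return 2.0 * C_KM_S * (1.0 - math.exp(-0.5 * n_pixels * delta_v / C_KM_S))


def wavenumber(j):
    """Wavenumber k = 2 pi / (velocity width of a cell) in s/km."""
    return 2.0 * math.pi / velocity_span(pixels_per_cell(j))


if __name__ == "__main__":
    for j in (8, 9, 10):
        n = pixels_per_cell(j)
        print(j, n, velocity_span(n), wavenumber(j))
```

On these assumptions the scales $j = 9$ and $j = 10$ correspond to cells of roughly 64 km/s and 32 km/s.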

This data set has been used for the Fourier power spectrum analysis (Croft et al. 1999, 2002). They
concluded that on scales k < 0.15 s/km the contamination of noise is small. In order to easily compare our
results with the Fourier analysis, we concentrate mainly on the DWT scales $j \le 10$, or bin number $N \ge 8$,
$k \le 0.2$ s/km, or $\Delta v \ge 32$ km/s.

The samples are contaminated by metal lines. It is generally believed that metal lines are narrow, with
a Doppler parameter b < 15 km/s (Rauch et al. 1997; Hu et al. 1995). We identified the big spikes on the
scale ~ 32 km/s for ten QSOs (Q0014+8118, Q0054-2824, Q0636+6801, Q0642+4454, Q0940-1050,
Q1017+1055, Q1103+6416, Q1422+2309, Q1425+6039, Q1759+7529) and checked if these are related to
metal lines. To estimate the effect of metal lines, we compare the statistical results of metal-line-removed
samples with those without removing metal lines.

In our analysis, we sometimes use the 28 QSO transmissions individually, i.e. calculate the statistics of
the transmission over each QSO separately, and sometimes all the transmissions are treated together. In the
latter case, we divide the data into 12 redshift ranges from $z = 1.6 + n \times 0.20$ to $1.6 + (n+1) \times 0.20$, where
$n = 0, ..., 11$. All the transmission flux in a given redshift range forms an ensemble. Note that the number
of data points in each redshift range is different.

The mean flux $\langle F\rangle$ of these samples at z = 1.6 is ~ 0.85, and decreases to ~ 0.5 at $z \simeq 3.5$. In each
redshift range, the dispersions of $\langle F\rangle$ are about 10-15%.

4.2 Treatment of unwanted data 

Before calculating the DWT power spectrum of observed data, we discuss our method of treating unwanted
data, including the pixels without data, contamination of metal lines, etc. On average, the S/N ratio of
the Keck spectra is high. The mean $1\sigma$ uncertainty in the flux values $F$ relative to the continuum $F_c$ in
the Lyα forest region is 4% on average (Croft et al. 2002). However, for some pixels, the S/N is as
low as about 1, such as pixels with negative flux. Most of these regions are saturated absorption regions.
Although the percentage of pixels within these regions is not large, they may introduce large uncertainties
in the analysis. We must reduce the uncertainty given by low S/N pixels.

The conventional technique of reducing these uncertainties is eliminating the unwanted bins and smoothly 
rejoining the rest of the forest spectra. However, by taking advantage of the localisation of wavelets we can 
use the algorithm of DWT denoising by thresholding (Donoho 1995) as follows 

1. Calculate the SFCs of both the transmission $F(x)$ and the noise $\sigma(x)$, i.e.

$$\epsilon^F_{jl} = \int F(x)\phi_{jl}(x)dx, \qquad \epsilon^N_{jl} = \int \sigma(x)\phi_{jl}(x)dx. \tag{22}$$

2. Identify an unwanted mode $(j, l)$ using the threshold condition

$$\left|\frac{\epsilon^F_{jl}}{\epsilon^N_{jl}}\right| < f, \tag{23}$$

where $f$ is a constant. This condition flags all modes with S/N less than $f$. We can also flag modes
dominated by metal lines.

3. Since all the statistical quantities in the DWT representation are based on an average over the modes
$(j, l)$, we will skip all the flagged modes while computing these averages. Therefore no rejoining
and smoothing of the data are needed.

We call this algorithm the conditional-counting method. It should be pointed out that condition eq. (23)
is applied on each scale $j$, and therefore the unwanted modes are flagged on a scale-by-scale basis. If the
size of an unwanted data segment is $d$, condition eq. (23) only flags modes $(j, l)$ on scales less than or
comparable to $d$. We also flag two modes around each unwanted mode to reduce any boundary effects of
the unwanted chunks.

Since the DWT calculation assumes that the sample is periodized, this may cause uncertainty at the
boundary of the sample. To reduce this effect, we drop five modes neighbouring the boundary of the
sample. With this method, we can still calculate the power spectrum by the estimators of eq. (19), but the
average is not over all modes $l$, but over the un-flagged modes only.
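
The conditional-counting steps described above can be sketched as follows for a single scale $j$. This is an illustration under stated assumptions, not the code used for the analysis: the SFCs are taken as given arrays, "two modes around each unwanted mode" is interpreted as one neighbour on each side, and "five modes neighbouring the boundary" as five modes at each end of the sample. The function names and defaults are hypothetical.

```python
import numpy as np


def flag_modes(sfc_flux, sfc_noise, f, n_neighbours=1, n_edge=5):
    """Boolean mask of modes to skip on one scale j.

    A mode is unwanted when |eps^F / eps^N| < f (threshold condition).
    n_neighbours and n_edge encode assumptions about how many neighbouring
    and boundary modes are also dropped.
    """
    sfc_flux = np.asarray(sfc_flux, dtype=float)
    sfc_noise = np.asarray(sfc_noise, dtype=float)
    # Written without a division so that zero-noise modes are handled safely.
    bad = np.abs(sfc_flux) < f * np.abs(sfc_noise)
    flagged = bad.copy()
    for shift in range(1, n_neighbours + 1):
        flagged[shift:] |= bad[:-shift]   # right-hand neighbours
        flagged[:-shift] |= bad[shift:]   # left-hand neighbours
    if n_edge > 0:
        flagged[:n_edge] = True
        flagged[-n_edge:] = True
    return flagged


def conditional_mean(values, flagged):
    """Average of `values` over the un-flagged modes only."""
    kept = np.asarray(values, dtype=float)[~flagged]
    if kept.size == 0:
        raise ValueError("all modes are flagged")
    return kept.mean()
```

Any per-mode statistic, such as the squared WFC entering the power spectrum, can then be averaged with `conditional_mean` so that no rejoining or smoothing of the spectrum is required.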

4.3 Testing the DWT denoising method 

At first glance, the conditional-counting condition eq. (23) seems to preferentially drop modes in the
low transmission regions, and may lead to an $f$-dependence of the power spectrum. To test this, we
calculate the power spectrum $P_j$ of Q1700+6419. The conditional-counting parameter $f$ is taken to be
$f = 1, 2, 3$ and 5. The results of $P_j$ vs. $j$ are shown in Fig. 2.

Fig. 2 shows that the power spectrum $P_j$ is independent of the parameter $f$. For other samples, the
results are the same. The reason can be seen from eq. (19), which shows that the contribution to
the power $P_j$ given by mode $(j, l)$ is $(\tilde\epsilon^F_{jl})^2 - (\tilde\epsilon^N_{jl})^2$. The noise subtraction term $(\tilde\epsilon^N_{jl})^2$ guarantees that the
contribution of modes with small S/N to $P_j$ is always small. For instance, for the modes with negative flux,
i.e., the modes with flux having the same order of magnitude as noise, the two terms $(\tilde\epsilon^F_{jl})^2$ and $(\tilde\epsilon^N_{jl})^2$
cancel each other statistically. Thus, in the range $f \le 5$, all the flagged modes always have very small
or negligible contributions to $P_j$. Denoising by thresholding is therefore reliable. The thresholding method checks
variables mode by mode, and therefore can only be effectively applied to a space-scale decomposed field.

5 Intermittent features of local power 

5.1 Spikiness of the spatial distribution of local power 
Eq. (19) can be rewritten as

$$P_j = \frac{1}{N_f}\sum_{l=1}^{N_f} P_{jl}, \tag{24}$$

and

$$P_{jl} = (\tilde\epsilon^F_{jl})^2 - (\tilde\epsilon^N_{jl})^2, \tag{25}$$

where $N_f$ is the number of modes remaining after applying the denoising condition of eq. (23). Eq. (24)
shows that the power spectrum $P_j$ is the average of the local power $P_{jl}$. For a given $l$, the $j$-distribution of $P_{jl}$
is the local power spectrum, i.e., the power spectrum in the spatial range of $l$. For a given $j$, the $l$-distribution
of $P_{jl}$ is the spatial distribution of local power.

The spatial distribution of local power gives a very intuitive demonstration of intermittency. Fig. 3
shows a typical spatial distribution of the local power of the transmitted flux. The left panels of Fig. 3 are
$P_{jl}/P_j$ for $j = 9$ and 10 (~ 64 km/s and 32 km/s, respectively). The right panels represent the corresponding
distribution of the phase-randomised (PR) data. The PR data are obtained by taking the inverse transform
of the Fourier coefficients of the original data after randomising their phases uniformly over $[0, 2\pi]$ without
changing the amplitudes. This gives rise to a PR field with the same unnormalised power spectrum as the
original field. The mean unnormalised powers of the left and the right panels of Fig. 3 are actually the
same. Note that, in Fig. 3, large spikes corresponding to metal lines have been removed.
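
The phase randomisation described above can be sketched with a real FFT. This is a minimal illustration, assuming a real 1-D field sampled on a uniform grid; the function name is hypothetical. The zero-frequency coefficient (and the Nyquist coefficient for even lengths) is left untouched so that the inverse transform is real and every Fourier amplitude is preserved.

```python
import numpy as np


def phase_randomise(field, rng):
    """Return a field with the same Fourier amplitudes and random phases."""
    field = np.asarray(field, dtype=float)
    coeffs = np.fft.rfft(field)
    # Draw new phases uniformly over [0, 2*pi) for every Fourier mode.
    phases = rng.uniform(0.0, 2.0 * np.pi, size=coeffs.size)
    randomised = np.abs(coeffs) * np.exp(1j * phases)
    # These coefficients must stay real for a real inverse transform.
    randomised[0] = coeffs[0]
    if field.size % 2 == 0:
        randomised[-1] = coeffs[-1]
    return np.fft.irfft(randomised, n=field.size)
```

Because only the phases are altered, the Fourier power spectrum of the output equals that of the input, while any spikiness carried by phase correlations is erased.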

Two main features can be observed in Fig. 3. First, the local power distributions of $P_{jl}/P_j$ on $j = 9$
and 10 (left panels) are significantly different from their counterparts of the PR sample (right panels). The
former show spiky structures, while the latter are noisy distributions. Therefore, the spikiness arises
completely from the phase correlation of the Fourier modes. Second, the spiky structures are more significant
on smaller scales, or larger $j$. That is, the ratio between the amplitudes of the spikes and the mean power
is higher on smaller scales.

Although spikes correspond to a large difference in the flux $|F(x+\Delta x) - F(x)|$, they are not always
related to Lyα absorption lines or sharp edges of saturated regions. This is because the strength of spikes is
measured by the ratio $P_{jl}/P_j$. Fig. 2 shows that the mean power of transmission flux fluctuations on $j = 10$
is about $P_{10} \lesssim 10^{-2}$. Thus, even for a spike as high as $P_{jl}/P_j \simeq 40$ at $j = 10$, the flux difference is only
$|F(x+L/2^{10}) - F(x)| \simeq 0.6$, which is less than $\langle F\rangle = 0.7$ and does not require either $F(x+L/2^{10})$ or $F(x)$
to equal zero. Therefore, spikes are different from the Lyα absorption lines, which correspond to valleys
identified by a Voigt profile fitting.

The difference between the spikes and Lyα absorption lines can also be seen from Fig. 4, which shows
the local power $P_{jl}$ on scale $j = 10$ against the corresponding $\sqrt{2^j/L}\,\epsilon^F_{jl}$ on scales $j = 10$ and $j = 11$ at
the same physical position $l$. Since $\sqrt{2^j/L}\,\epsilon^F_{jl}$ is the flux smoothed on scale $j$, absorption lines having a
width of the order of the scale $j$ correspond to $\sqrt{2^j/L}\,\epsilon^F_{jl} \simeq 0$. Fig. 4 shows the modes with the top 1% power among
all modes of the 28 QSOs combined. All these local powers $P_{jl}$ are larger than $10P_j$, and therefore, they
are spikes. One can see from Fig. 4 that most spikes are not related to saturated regions with $F \simeq 0$ or
$\sqrt{2^j/L}\,\epsilon^F_{jl} \simeq 0$.

5.2 PDF of local powers 

By definition, spikes correspond to structures of large $|F(x+\Delta x) - F(x)|$. A spiky field indicates an
excess of large transmission fluctuations compared to a Gaussian distribution. Thus, the non-Gaussianity
of a spiky field can be described by the PDF, or the one-point distribution, of the local power $P_{jl}$.

To calculate the PDF, we use the 12 redshift ranges of the samples mentioned in §4.1. For each redshift
range, we construct an ensemble consisting of all $P_{jl}$ from the 28 QSOs for which the position $l$ is in the
redshift range. Fig. 5 plots the PDFs of $P_{jl}/P_j$ on the scale $j = 10$ in the 12 redshift ranges.

If the field $\delta(x)$ is Gaussian, the PDF of $\tilde\epsilon^F_{jl}$ is also Gaussian. Thus, the PDF of $y = P_{jl}/P_j$ should
be a $\chi^2(N=1)(y)$ distribution, which is also plotted in Fig. 5 (solid line). Compared to the $\chi^2(N=1)$
distribution, the PDFs of the Keck data are generally higher at $P_{jl}/P_j \simeq 0$, lower at $P_{jl}/P_j \simeq 1$, and again
higher at $P_{jl}/P_j > 10$. That is, for most modes the local powers are low or close to zero, while rare modes
have high power (long-tail).

In Fig. 5, five of the 12 PDFs have tails as long as $P_{jl}/P_j > 32$, and six have $P_{jl}/P_j > 10$. The number
of long-tails cannot directly be used to measure the degree of the spikiness, as different PDFs of Fig. 5 are
given by different numbers of independent modes. Nevertheless, it is clear that all the long-tail events have
a much larger probability than for a Gaussian field.

Fig. 6 shows the PDF of the local power $P_{jl}/P_j$ in the redshift range 2.8 < z < 3.0 on scales $j = 8$, 9 and
10, with the parameter $f$ in eq. (23) taken as $f = 1$ (top), $f = 3$ (middle) and $f = 5$ (bottom). We see from Fig. 6 that the
observed PDF is independent of the parameter $f$. The observed PDF of $P_{jl}/P_j$ at $j = 8$ is consistent with
the $\chi^2$ distribution, but it is long-tailed for $j = 9$ and 10. At $j = 10$, the long-tail is given by $P_{jl}/P_j \simeq 32$.
A spike with $P_{jl} > 32P_j$ corresponds to an event of $\simeq 5.6\sigma$. In Fig. 6, we use 2100 $P_{jl}/P_j$ data points (modes)
for the statistics on $j = 10$ and $f = 3$, and therefore the observed probability of the $5.6\sigma$ event is about
$4 \times 10^{-4}$. This is much larger than the Gaussian probability of a $5.6\sigma$ event.
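
To put a number on this comparison, the sketch below evaluates the Gaussian expectation for a local power exceeding 32 times the mean. It relies on the standard identity that, for a Gaussian variable $z$ of unit variance, $y = z^2$ follows the $\chi^2(N=1)$ distribution and $P(y > t) = \mathrm{erfc}(\sqrt{t/2})$. Taking the observed probability as one event among the 2100 modes is an assumption about how the quoted figure arises; the names are illustrative.

```python
import math


def chi2_1_tail(y):
    """P(P_jl / P_j > y) if the WFCs are Gaussian, i.e. for a chi^2(N=1) PDF."""
    return math.erfc(math.sqrt(y / 2.0))


N_MODES = 2100                     # modes used on j = 10, f = 3 (from the text)
observed = 1.0 / N_MODES           # assumed: a single event among these modes
gaussian = chi2_1_tail(32.0)       # Gaussian expectation for y > 32

if __name__ == "__main__":
    print(observed, gaussian, observed / gaussian)
```

On these assumptions the observed frequency exceeds the Gaussian expectation by more than four orders of magnitude.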

In contrast to Figs. 5 and 6, Fig. 7 shows the PDF of $P_{jl}/P_j$ for a PR sample. As expected, the PR field
follows the $\chi^2(N=1)$ distribution on small scales ($j = 10$).

Some errors in Figs. 5 and 6 might be caused by the 10% dispersion of the mean flux (§4.1), as
each ensemble in a given redshift range contains local power modes $P_{jl}$ from different QSOs, which have
different mean flux $\langle F(x)\rangle$. However, this error will not change the intermittent features. This can be seen
from Fig. 8, which gives the PDF of $P_{jl}/P_j$ for one sample, Q0014+8118, at $j = 10$. Fig. 8 shows
exactly the same features as Figs. 5 and 6.

Moreover, Fig. 8 gives the distribution of $P_{jl}/P_j$ for the sample Q0014+8118 with and without removing
high spikes located in the regions contaminated by metal lines. The figure shows the existence of a long-tail
regardless of whether the metal lines and metal line suspects are removed or not. Other quasars searched
for metal lines show similar results.

Therefore, the long-tail seems to be a permanent feature of the PDF of local power. It is not caused by
noise or other contamination. To measure the long-tail we add a lognormal PDF in Fig. 8 given by

$$P(y) = \frac{1}{y\sqrt{2\pi\mu}}\exp\left[-\frac{(\ln y + \mu/2)^2}{2\mu}\right]. \tag{26}$$

For this PDF, the mean of $y$ is always equal to 1, i.e., $\bar{y} = 1$. The parameter $\mu$ measures the long tail: the larger
$\mu$, the longer the tail. The best fit to the observed long-tail is $\mu \simeq 1.5$. The observed long-tail is longer
than that of lognormal PDFs. The PDF of the flux field $\delta(x)$ has a more prominent long-tail than a lognormal field.
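
The unit-mean lognormal PDF used for this comparison can be explored numerically with the sketch below. It assumes the parametrisation in which $\mu$ is the variance of $\ln y$ and the mean of $\ln y$ is $-\mu/2$, which is the form that gives $\bar{y} = 1$; this parametrisation and all function names are assumptions made for illustration.

```python
import math


def lognormal_pdf(y, mu):
    """Unit-mean lognormal PDF; mu is the variance of ln y (assumed form)."""
    return (math.exp(-(math.log(y) + 0.5 * mu) ** 2 / (2.0 * mu))
            / (y * math.sqrt(2.0 * math.pi * mu)))


def lognormal_tail(y, mu):
    """Probability P(Y > y) for the same PDF."""
    return 0.5 * math.erfc((math.log(y) + 0.5 * mu) / math.sqrt(2.0 * mu))


def numerical_mean(mu, n_steps=20000, u_max=30.0):
    """Midpoint-rule integral of y P(y) dy, carried out in u = ln y."""
    du = 2.0 * u_max / n_steps
    total = 0.0
    for i in range(n_steps):
        y = math.exp(-u_max + (i + 0.5) * du)
        total += y * lognormal_pdf(y, mu) * y * du  # dy = y du
    return total


if __name__ == "__main__":
    print(numerical_mean(1.5), lognormal_tail(32.0, 1.5))
```

The numerical mean stays at unity for any $\mu$, while the tail probability at large $y$ grows rapidly with $\mu$.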

6 Intermittency and the precision of power spectrum 

6.1 Domination of power spectrum by spikes 

As shown by eq. (24), the power spectra, $P_j$, are given by the mean of local powers over independent
modes $l = 1, ..., N_f$. When the spiky features are pronounced, the power of the transmission fluctuations is
concentrated in the spikes, and therefore, a big fraction of the power in eq. (24) is actually dominated by
the spikes.

To demonstrate the spikiness, we calculate the power spectrum by averaging the local power $P_{jl}$ in the 12
redshift ranges. We also calculate the averages over the 12 local power ensembles, but dropping the top 1%,
3% and 5% of the local power modes. We plot the ratio of power after dropping the highest modes to the
power without dropping any modes. The result is shown in Fig. 9.
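
The ratio described above can be computed as in the following sketch, which takes an ensemble of local powers and returns the mean after the highest modes are dropped, divided by the mean over all modes. The function name and the rounding convention for the number of dropped modes are assumptions.

```python
import numpy as np


def power_ratio_after_drop(local_power, top_fraction):
    """Ratio of the mean local power without the top modes to the full mean.

    local_power  : 1-D array of local powers in one ensemble.
    top_fraction : fraction of the highest modes to drop, e.g. 0.01 for 1%.
    """
    local_power = np.asarray(local_power, dtype=float)
    n_drop = int(round(top_fraction * local_power.size))
    if n_drop >= local_power.size:
        raise ValueError("cannot drop every mode")
    # Sort in ascending order and keep everything except the n_drop largest.
    kept = np.sort(local_power)[: local_power.size - n_drop]
    return kept.mean() / local_power.mean()
```

For example, in an ensemble of 100 modes where a single spike carries half of the total power, dropping the top 1% halves the estimated power.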

One can see from Fig. 9 that for most cases, dropping the top 5% modes leads to a decrease in $P_j$ by
a factor equal to or larger than 2. That is, 50% or even more of the power is given by the top 5% modes.
On scale $j = 10$, dropping the top 1% modes leads to a 20% or more decrease in $P_j$. If the field were
Gaussian, a top 1% elimination of data would lead to a decrease in $P_j$ of no more than 3%, and a top 5%
elimination would lead to a decrease of no more than 8%. Therefore, Fig. 9 shows that the power spectrum
is substantially dependent on the rare events, the high spikes. The power is concentrated in the spikes. As a
consequence, the number of effective modes of the random field is significantly reduced, i.e., only the rare
modes contribute to the measurement of the power spectrum, while other modes are inactive. This
leads to uncertainty in the power spectrum.

6.2 Uncertainty in the power spectrum of an intermittent field 

Generally, the uncertainty in the power spectrum can be effectively reduced by increasing the number of 
independent modes of measurement. Using the "fair sample hypothesis" (Peebles 1980), we first construct 
an ensemble of samples by dividing the observed sample into a set of N subsamples. We then calculate the 
mean and variance over the ensemble. The precision of the power measurement would then be improved if 
the number is large, as the error is °< y/ 1 /N. If this is always true, then we should divide the sample into 
as many independent subsamples as possible. 

To detect the power of Lya transmitted flux fluctuations on the scale k, the largest possible number N 
is given by the uncertainty relation AxAk ~ 2k. Therefore, we should divide the 1-D flux field (xi ,X2) into 
segments with size Ax > 2n/k. In this case, Ak < k. The segment gives valuable information of the power 
in the band k ± Ak. This is just what the DWT does. The local power Pji, given by DWT, provides the 
largest possible number of modes for detecting power in the band corresponding to k. 

However, the precision improvement factor $\sqrt{1/N}$ rests largely on the central limit theorem. Let us
consider eq. (24) to be a definition of the stochastic variable Pj constructed from the stochastic variables
Pjl. According to the central limit theorem, if all Pjl are independent variables having identical PDFs, then
Pj will approach a Gaussian PDF when Nf is large, regardless of the PDF of Pjl. Actually, the convergence
of Pj to a Gaussian variable is fast as Nf increases. This result ensures that the error of the power spectrum
Pj is basically proportional to the Gaussian factor $\sqrt{1/N_f}$, even when the field is nonlinear and
non-Gaussian.

Yet the central limit theorem does not work well for fields having a divergent ratio between their
moments [eqs. (1) or (2)]. In principle, a superposition of intermittent fields will also converge to a
Gaussian limit, and the error will decrease as $\sqrt{1/N_f}$ as required by the central limit theorem.
However, this needs a large number Nf. This is because the central limit theorem relies on the existence of
a unique relationship between the PDF and its moments (Vanmarcke 1983). The PDF of a lognormal field is
not uniquely determined by its moments (Crow & Shimizu 1988). This leads to a very slow convergence of
the PDF of Pj to the limiting Gaussian PDF as Nf increases. The PDF of Pj is still long-tailed if the PDF of
Pjl is long-tailed, as for a lognormal field. The number Nf needed for the convergence to a Gaussian Pj can
be estimated by

Table 1

Nf     Pj      σ_P             95% confidence    99% confidence
99     0.134   0.215 (0.250)   < 0.844 (0.823)   < 0.999 (1.85)
198    0.132   0.231 (0.246)   < 0.835 (0.810)   < 1.30 (1.82)
397    0.120   0.224 (0.223)   < 0.827 (0.736)   < 1.30 (1.65)

(Barakat 1976) as

$$N_f \simeq \gamma_1^{4}, \tag{27}$$

where $\gamma_1 = (e^{\mu} - 1)^{1/2}(e^{\mu} + 2)$ is the skewness of the PDF eq. (26). Thus, for
$\mu \simeq 1.5$ (§5.2), we have $N_f \simeq 2.1 \times 10^4$. Therefore, the central limit theorem is less
effective for an intermittent field. For the PDF shown in Fig. 8, the precision improvement factor
$\sqrt{1/N}$ does not apply until Nf is as large as required by eq. (27).
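
As a quick numerical check of this estimate, the following Python sketch evaluates the skewness and the
resulting number of modes. It assumes that the lognormal parameter of eq. (26) enters the skewness as
exp(1.5) and that eq. (27) scales as the fourth power of the skewness. Both are readings of the notation
adopted here because they reproduce the value of Nf quoted above, and the function name is purely
illustrative.

```python
import math


def lognormal_skewness(mu):
    """Skewness (e^mu - 1)^(1/2) (e^mu + 2); the e^mu form is an assumption."""
    e = math.exp(mu)
    return math.sqrt(e - 1.0) * (e + 2.0)


mu = 1.5                      # lognormal parameter quoted in the text
gamma1 = lognormal_skewness(mu)
n_needed = gamma1 ** 4        # assumed reading of eq. (27)
print(f"gamma_1 = {gamma1:.2f}, N_f ~ {n_needed:.2e}")
# prints: gamma_1 = 12.09, N_f ~ 2.14e+04
```

Under these assumptions, roughly 2 x 10^4 independent modes are needed, about two orders of magnitude
more than the few hundred local power modes available on one scale in a single spectrum.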

To demonstrate this property, we calculate the mean power and its 1-σ dispersion for Q0014+8118 on the
scale j = 9. The power is calculated by eq. (24), and the 1-σ by

$$\sigma_P = \left[ \frac{1}{N_f} \sum_{l=1}^{N_f} \left( P_{jl} - P_j \right)^2 \right]^{1/2}. \tag{28}$$

In the redshift range 2.7 - 3.12, there are 397 local power modes Pjl available. These powers are
independent in the sense that the cross correlations
$\langle \tilde{\epsilon}_{jl} \tilde{\epsilon}_{jl'} \rangle \simeq 0$ if $l \neq l'$ (Pando, Feng, & Fang 2001).
However, the PDF of these Pjl is close to eq. (26). The results are given in Table 1. Here Nf means that we
use only the first Nf local power modes Pjl out of the total 397 modes. Table 1 shows that σ_P does not
follow the $\sqrt{1/N_f}$ dependence; it is almost independent of Nf. On the other hand, we find that the
phase randomised (PR) samples do show a decrease in the error by the factor $1/\sqrt{N_f}$.

The numbers in brackets in Table 1 are calculated by using the PDF in eq. (26), which gives

$$\sigma_P = P_j (e^{\mu} - 1)^{1/2}, \quad 95\%\ \mathrm{Con.} = P_j\, e^{1.96\mu - \mu^2/2}, \quad 99\%\ \mathrm{Con.} = P_j\, e^{2.58\mu - \mu^2/2}. \tag{29}$$

We use $\mu = 1.5$. All the observed results can roughly be fitted by the lognormal PDF eq. (26). That is,
regardless of Nf in the range 99 - 397, the PDF of Pj is long-tailed. Again, with the phase randomised
samples, the PDF of Pj is Gaussian, with a variance following the factor $1/\sqrt{N_f}$.

Therefore, the dispersion σ_P does not effectively decrease with increasing Nf when Nf is of
the order of $10^2$. Table 1 also shows that the values of σ_P are generally large. This is directly due to
the intermittency, i.e. the PDF eq. (26).



6.3 Error estimation by subsamples with optimal size 

To reduce the uncertainty, or 1-σ error, of the power spectrum of the flux fields, a popular method of
error estimation is based on subsamples or segments with an optimally selected size. For instance,
Croft et al. (2002) used the jackknife error estimator, which is found by dividing the sample into N
subsamples and computing the 1-σ error bars by $\sigma_P = [(1/N) \sum_i (P_i - P)^2]^{1/2}$, where
P is the mean power from the full data sample and $P_i$ is the mean power estimated by leaving out the
subsample i. They used N = 5 as the optimal number. McDonald et al. (2000) used a modified bootstrap
error estimator, in which they re-sampled the data not pixel-by-pixel, but segment-by-segment. Each
segment has a size of 100 pixels, which is also used as an optimal number.
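
The leave-one-out construction described above can be sketched in Python as follows. This is a minimal
illustration with a hypothetical function name and toy input, not the implementation of Croft et al.
(2002). The default prefactor 1/N follows the formula as quoted in this section; the optional prefactor
(N - 1)/N is the standard textbook jackknife normalisation and is included only for comparison.

```python
import math


def jackknife_sigma(subsample_powers, textbook_norm=False):
    """1-sigma error from leave-one-subsample-out mean powers.

    subsample_powers : list of lists; each inner list holds the power
                       measurements belonging to one subsample
    textbook_norm    : False -> prefactor 1/N (formula as quoted in the text)
                       True  -> prefactor (N - 1)/N (standard jackknife)
    """
    n_sub = len(subsample_powers)
    all_values = [p for sub in subsample_powers for p in sub]
    p_full = sum(all_values) / len(all_values)  # mean power of the full sample

    # Mean power with subsample i left out.
    leave_one_out = []
    for i in range(n_sub):
        rest = [p for k, sub in enumerate(subsample_powers) if k != i for p in sub]
        leave_one_out.append(sum(rest) / len(rest))

    sq_dev = sum((p_i - p_full) ** 2 for p_i in leave_one_out)
    prefactor = (n_sub - 1) / n_sub if textbook_norm else 1.0 / n_sub
    return math.sqrt(prefactor * sq_dev)


# Toy input (assumption): three subsamples with two measurements each.
toy_subsamples = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
print(jackknife_sigma(toy_subsamples), jackknife_sigma(toy_subsamples, True))
```

Only the N leave-one-out means enter the estimate, so the result depends on the chosen number of
subsamples.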

The 1-σ error given by these methods is indeed much smaller than the σ_P listed in Table 1. We need
to understand why they yield smaller error bars, and what the criterion is for selecting the optimal size of
subsamples and segments. In our view, these problems are not purely technical, but rather depend on the
intermittent nature of the Lyα flux field.

Table 2

Line   N     M_i   Pj      σ_P     95% Con.   99% Con.
1      397   1     0.120   0.224   < 0.827    < 1.30
2      40    9     0.123   0.073   < 0.233    < 0.282
3      20    19    0.121   0.044   < 0.208    < 0.215
4      10    39    0.121   0.015   -          < 0.143
5      5     79    0.120   0.015   -          < 0.143

To demonstrate these problems, we analyse the local power modes Pjl. The jackknife estimator is
equivalent to dividing a total of Nf local power modes (Pjl) into N groups (subsamples). Each group
contains M modes, i.e., Nf = NM. The 1-σ uncertainty is calculated by

$$\sigma_P = \left[ \frac{1}{N} \sum_{i=1}^{N} \left( P_j^{(i)} - P_j \right)^2 \right]^{1/2} \tag{30}$$

and

$$P_j^{(i)} = \frac{1}{M_i} \sum_{l \in (i)} P_{jl}, \tag{31}$$

where the subscript and superscript i denote the group (subsample), i = 1, ..., N. The summation runs
over the modes l in the group i.

Using eqs. (30) and (31), we again calculate the mean and σ_P of the j = 9 power of Q0014+8118.
The result is listed in Table 2. Line 1 of Table 2 is the same as the Nf = 397 line of Table 1. It is for
N = 397 and M_i = 1. Lines 2 through 5 are obtained by dividing the 397 local power modes (Pjl) into 40,
20, 10 and 5 subsamples, respectively. When the size of the subsamples is 9 or 19 (lines 2 and 3), σ_P is
still large, and the numbers for 95% Con. and 99% Con. are also larger than 2σ_P and 2.5σ_P. Therefore,
the PDF of Pj is still long-tailed. In lines 4 and 5, we do not show the 95% Con. level, because the total
number N of subsamples is only 10 or 5, which is too small to measure the PDF with a resolution < 10%.
The value of 99% Con. indicates that the PDF of Pj in lines 4 and 5 is probably no longer long-tailed. The
value of σ_P for lines 4 and 5 is much smaller than for lines 1 and 2. Thus, from Table 2, the optimal size
of the subsamples appears to be M ≃ 40 or 80, i.e. N = 10 or 5. We also calculated powers on other scales
and found that they behave similarly to Table 2. Therefore, the criterion for the optimal size M might
be that the PDF of Pj is no longer long-tailed.

From Table 2, we see that the error estimator based on subsamples or segments with optimal size
calculates only the dispersion of powers among the subsamples or segments, and therefore it is not a
measure of the dispersion among all independent modes of the data. If two subsamples have the same mean
power Pj but very different distributions of Pjl, their contributions to σ_P (eq. (30)) are the same.
Therefore, σ_P does not measure the uncertainty caused by the difference between the intermittent
subsamples. Only if the two subsamples are Gaussian would their statistical properties be the same when
they have the same Pj (second moment). Therefore, the error estimator with subsamples and segments
implicitly assumes that the subsamples are Gaussian. Eq. (30) with the optimal size subsamples does not
give the error of the original intermittent field, but that of the corresponding Gaussian subsamples having
the same Pj as the intermittent field. We have checked the PDF of Pjl of the subsamples in lines 4 and 5 of
Table 2. They are substantially non-Gaussian.
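
The grouping of the local power modes into subsamples, and the point that only the subsample means
enter the estimate, can be illustrated with a short Python sketch. The function name and the toy data are
hypothetical. The prefactor 1/N is an assumption, chosen so that the result reduces to the mode-by-mode
dispersion when each group holds a single mode, and modes left over after forming N equal groups are
simply dropped (as when 397 modes are divided into 40 groups of 9).

```python
import math


def subsample_sigma(local_powers, n_groups):
    """Mean power and dispersion of the group means for N equal-size groups."""
    m = len(local_powers) // n_groups          # modes per group
    used = local_powers[: m * n_groups]        # drop leftover modes
    p_j = sum(used) / len(used)                # mean power over the used modes
    group_means = [sum(used[i * m:(i + 1) * m]) / m for i in range(n_groups)]
    variance = sum((g - p_j) ** 2 for g in group_means) / n_groups
    return p_j, math.sqrt(variance)


# Two toy samples whose groups of four have the same means (1 and 2):
smooth = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0]
spiky = [0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 8.0]

print(subsample_sigma(smooth, 2), subsample_sigma(spiky, 2))  # identical results
print(subsample_sigma(smooth, 8), subsample_sigma(spiky, 8))  # mode-by-mode: they differ
```

With two groups the smooth and the spiky samples give the same error, although their mode-by-mode
dispersions differ by more than a factor of five. The information about the spikes is lost once the modes
are averaged within each group.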

Therefore, the improvement in the error estimation using optimal size subsamples or segments
essentially amounts to replacing the intermittent field within each subsample with a Gaussian field. The
intermittent features of the field are overlooked. Moreover, these errors depend on a parameter (the
optimal size). According to eq. (1), the spikiness is scale-dependent, and therefore the optimal size given
by Table 2 is generally scale-dependent. A single parameter choice cannot give the optimal size for all
scales.

Table 3

Nf     N_B    Pj      σ_P      95% Con.   99% Con.
397    397    0.120   0.0110   < 0.140    < 0.153
198    198    0.119   0.0159   < 0.145    < 0.162
99     99     0.120   0.0231   < 0.167    < 0.185

6.4 Error estimation without subsamples 

The analysis of the last subsection shows that it is necessary to have an error estimator for the Lyα
flux field that does not require any extra parameters, like the size of subsamples, and that is sensitive to
the intermittency of the field. This can be done with the local power Pjl. Let us consider eqs. (24) and (28)
once again. As mentioned in §6.2, the Nf data of Pjl can be considered independent. They can be used as
the parent sample for bootstrap resampling. That is, we generate Nf samples, each having Nf modes, by
drawing Pjl from the parent sample with replacement. We then calculate the mean and 1-σ over the Nf
realizations.
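
The resampling procedure described above can be sketched as follows. This is a minimal illustration, not
the code used for Table 3: the function name, the random seed, the toy lognormal parent sample and the
use of empirical percentiles of the resampled means as the 95% and 99% levels are all assumptions.

```python
import math
import random


def bootstrap_power(local_powers, n_boot=None, seed=0):
    """Bootstrap the mean power from the independent local power modes.

    Returns (mean, sigma, upper95, upper99) over the bootstrap realizations.
    """
    rng = random.Random(seed)
    n_f = len(local_powers)
    n_boot = n_f if n_boot is None else n_boot   # N_B = N_f, as in Table 3

    means = []
    for _ in range(n_boot):
        resample = rng.choices(local_powers, k=n_f)  # draw with replacement
        means.append(sum(resample) / n_f)
    means.sort()

    mean = sum(means) / n_boot
    sigma = math.sqrt(sum((m - mean) ** 2 for m in means) / n_boot)
    upper95 = means[min(n_boot - 1, int(0.95 * n_boot))]
    upper99 = means[min(n_boot - 1, int(0.99 * n_boot))]
    return mean, sigma, upper95, upper99


# Toy long-tailed parent sample (assumption): 397 lognormal "local powers".
toy_rng = random.Random(1)
parent = [toy_rng.lognormvariate(0.0, 1.0) for _ in range(397)]
mean, sigma, up95, up99 = bootstrap_power(parent)
print(f"mean = {mean:.3f}, sigma = {sigma:.3f}, 95% < {up95:.3f}, 99% < {up99:.3f}")
```

No subsample size enters the procedure. The only inputs are the local powers themselves, so each
realization inherits the long tail of the parent sample.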

With this method, we again calculate the mean and σ_P on j = 9 for Q0014+8118. First, Fig. 10 plots
the PDF of the parent data of Pjl (j = 10) of Q1103, and two PDFs given by bootstrap realizations. It shows
that the PDF of the synthetic data of each realization is the same as that of the original Pjl, i.e., it is
long-tailed. Therefore, the ensemble of the realizations from the bootstrap resampling contains all the
information on the intermittency.

The results for the mean and σ_P are listed in Table 3. Here N_B is the number of samples given by
bootstrap re-sampling. We see that all the results are very stable. For Nf = 397 and 198, the relations
among σ_P, 95% Con. and 99% Con. are close to Gaussian. This is because the total number of modes used
in the estimation of Table 3 is of the order of $\geq 10^2 \times 10^2 \simeq 10^4$, which is comparable with
the number required by the central limit theorem (§6.2). In the case of Nf = 99, the number is not large
enough, and σ_P is much larger. Therefore, this method illustrates the slow convergence of Pj to the
Gaussian limit due to intermittency.

As a final result, we show in Fig. 11 the mean power Pj and its error bars on scales j = 8, 9 and 10
given by the bootstrap resampling estimator developed above. The powers in Fig. 11 are shown in
each redshift bin of the Keck data. As the field is highly non-Gaussian, we use the confidence level to
describe the uncertainty range. The error bars are the 99% confidence range of the bootstrap resampling.
The error bars are independent of the parameter f, which is taken to be 1 (circle), 3 (triangle) and 5
(pentagon).

Fig. 11 shows clearly the effect of intermittency. The error bars at z = 1.7 and 1.9 are much larger than
the others, because these samples are not large enough. For redshift bins in which the number of
independent modes matches the condition of the CLT convergence, this bootstrap resampling method
effectively reduces the uncertainty in the power spectrum without introducing extra parameters.

7 Discussion and conclusions 

The most popular statistical measure in large scale structure study is the power spectrum. For Gaussian
fields, there are many effective and successful algorithms for calculating the power spectrum and
estimating its errors. The power spectrum in the nonlinear regime is also important, not only for
constraining cosmological parameters but also for determining the initial conditions for simulations of
galaxy formation. However, the effects of non-Gaussianity upon the power spectrum have not yet been
fully studied.

We studied the effect of the non-Gaussianity of the field of QSO Lyα transmitted flux fluctuations
on the estimation of their power spectrum and errors. The flux field is intermittent. Generally speaking,
intermittency poses problems for the detection of the power spectrum, as a large fraction of the power of
the transmission fluctuations is concentrated in rare and improbable events: high spikes. Thus, the power
spectrum is sensitive to small probability events, and the dispersion among the different spatial
regions is large. This property directly challenges the application of the "fair sample hypothesis" (Peebles
1980), which assumes that a part of the universe is a fair sample of the whole and that ensemble averages
can be calculated by spatial averages. That is, the spatial average will not converge to the ensemble average
if the dispersion among the different spatial regions is large. Mathematically, this is shown by the slow
convergence of Pj to the Gaussian limit required by the central limit theorem when the PDF of the field is
long-tailed due to intermittency.

To reasonably estimate the uncertainty of the power spectrum, we should carefully analyse the
convergence of the data set considered to the CLT limit. With this analysis, we show that some
conventional methods essentially estimate the errors by the dispersion of the powers among subsamples,
ignoring the dispersion among all the independent modes of the data. They do not measure the error of the
non-Gaussian or intermittent field, but that of a Gaussian field. The error given by these methods is
parameter-dependent.

Based on the analysis of the CLT convergence, we proposed an error estimator for the power spectrum.
It is based on a bootstrap resampling among the local powers of all the modes. This estimation does not
need any extra parameter like the size of subsamples, and does not ignore the intermittency of the fields.
The powers and errors for 28 transmitted flux samples are calculated with this method. This result shows
the effect of intermittency on the power spectrum, and gives a more effective estimation of errors than the
"optimal" size method, especially when the number of independent modes matches the condition of the
CLT convergence.

We studied in this paper only the unnormalised power spectrum eq. (18), not the normalised power
spectrum eq. (17). If the normalisation is given by a constant $(F(k))^2$, all the results for the
unnormalised power spectrum hold for the normalised power spectrum. If the background $(F(k))^2$ is not
constant but position dependent, the local fluctuations will couple with the background when the field is
non-Gaussian (Jamkhedkar, Bi, & Fang 2001). This case should be studied in detail for the detection and
error estimation of the normalised power spectrum.

Figure 1: Redshift range of the transmission flux of the 30 Keck QSO absorption spectra.

Figure 2: The DWT power spectra Pj of Q1700+6419. The parameter f for the conditional counting is
taken to be 1, 2, 3 and 5. From §4.1, the scale j corresponds to
$\Delta v = 2c\{1 - \exp[-(1/2)\, 2^{13-j} \delta v / c]\} \simeq 2^{13-j} \delta v$, and
$k = 2\pi / (2^{13-j} \delta v)$ s/km.

Figure 3: A section of the spatial distribution of local powers, Pjl/Pj, on scales j = 9 and 10
(corresponding to ≃ 64 and 32 km/s) for the sample Q1103+6416. The real field is shown in the left panels
and the right panels represent the PR field. Large spikes associated with metal lines have been removed.

Figure 4: The top 1% local powers on scale j = 10 and their corresponding smoothed fluxes (SFCs) on
scales j = 11 (top panel) and 10 (bottom panel).

Figure 5: The PDFs of Pjl/Pj of real samples on the scale j = 10 in 12 redshift ranges z = 1.6 + n × 0.2 to
1.6 + (n + 1) × 0.2, n = 0, ..., 11. The solid lines are the χ²(N = 1) distribution.



[Figure 6 panels. Horizontal axes: log10(Pjl/Pj) for j = 8, 9 and 10.]

Figure 6: The PDF of Pjl/Pj of real samples on scales j = 8, 9 and 10 in redshift range z = 2.8 to 3.0. The
parameter f is taken to be 1 (top), 3 (middle) and 5 (bottom). The solid lines are the χ²(N = 1) distribution.

[Figure panel. Horizontal axis: log10(Pjl/Pj) for j = 10.]

Figure 8: The PDF of Pjl/Pj for Q0014+8118 and f = 3 without the removal of metal lines (squares), and
with the removal of metal lines and metal line suspects from the big spikes (crosses). The solid line is the
χ²(N = 1) distribution. The dotted line is the lognormal distribution (eq. (26)) with μ = 1.5.

Figure 9: Ratio of power (Pjl)d with top modes dropped to the total power vs. redshift of real data on scales
j = 8, 9 and 10. Square, pentagon and hexagon are, respectively, the ratio of powers when the top 1%, 3%
and 5% data are dropped.

[Figure 10 panel. Horizontal axis: log10(Pjl/Pj).]

Figure 10: The PDF of Pjl/Pj for Q1103 on scale j = 9. Triangles are for real data. Squares and crosses
correspond to two bootstrap realizations. The solid line is the χ²(N = 1) distribution. The dotted line is the
lognormal distribution (eq. (26)) with μ = 1.5.

Figure 11: The power spectrum Pj on scales j = 8, 9 and 10, in each redshift bin of the Keck data. Error
bars are the 99% confidence given by the bootstrap resampling. The parameter f is taken to be 1 (circle),
3 (triangle) and 5 (pentagon).